
@sarthak-amd (Collaborator) commented Oct 16, 2025

Description

The fused cross-entropy Triton kernel currently has two bugs (a reference sketch of both follows the description):

  1. If `ignore_idx` is not None, the loss should be computed only over the valid tokens, not over all tokens (new fix).
  2. Gradient scaling when `reduce_loss=False` (already fixed upstream):
    • If `reduce_loss=False`, we should return the per-token losses without reducing them; otherwise the gradients are shrunk by 1/N (vanishing gradients), which shows up as a wrong (higher) training loss.
    • When `reduce_loss=False`, `grad_output` is a tensor, not a scalar, so the backward pass must load one value per row instead of a single scalar.

This fix was validated with Llama 3.1 8B pre-training.
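As a reference for the intended semantics, here is a minimal PyTorch sketch (illustration only, not the Triton kernel itself), assuming flattened logits of shape [N, V], targets of shape [N], and -100 as the ignore index:

```python
import torch
import torch.nn.functional as F

# Illustrative sizes only; the real kernel operates on flattened [tokens, vocab] logits.
N, V, IGNORE_IDX = 8, 32, -100
logits = torch.randn(N, V, requires_grad=True)
targets = torch.randint(0, V, (N,))
targets[::3] = IGNORE_IDX  # mark a few tokens as ignored

# Bug 1: with ignore_idx set, the reduced loss must average over valid tokens only.
per_token = F.cross_entropy(logits, targets, ignore_index=IGNORE_IDX, reduction="none")
num_valid = (targets != IGNORE_IDX).sum()
loss_correct = per_token.sum() / num_valid   # divide by the number of valid tokens
loss_buggy = per_token.sum() / N             # dividing by all tokens understates the loss

# Bug 2: with reduce_loss=False the per-token losses are returned unreduced, and in
# backward grad_output carries one value per row, so each row's gradient is scaled
# by its own upstream value rather than by a single scalar (or shrunk by 1/N).
grad_output = torch.rand(N)
per_token.backward(gradient=grad_output)
```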

Type of change

  • Bug fix (non-breaking change which fixes an issue)

@sarthak-amd sarthak-amd marked this pull request as ready for review October 16, 2025 12:31
@sarthak-amd sarthak-amd changed the title Loss Scaling and Vanishing Grads Fused Cross Entropy Triton - Loss Scaling and Vanishing Grads Bugfix Oct 16, 2025
@wenchenvincent (Collaborator)

@sarthak-amd Could you post the PR for the upstream fix?

@@ -1,3 +1,5 @@
# This file was modified for portability to AMDGPU
Review comment (Collaborator):

There is no real change in this file. Let's keep this file intact and then we don't need to add the AMD copyright statement.

@sarthak-amd (Collaborator, Author)

> @sarthak-amd Could you post the PR for the upstream fix?

NVIDIA/TransformerEngine@e9a5fa4 @wenchenvincent

@wenchenvincent (Collaborator)

> @sarthak-amd Could you post the PR for the upstream fix?
>
> NVIDIA/TransformerEngine@e9a5fa4 @wenchenvincent

Another fix came from the upstream PR NVIDIA/TransformerEngine#1879. Is the test change from that PR also reflected here?

@wenchenvincent (Collaborator)

For the ignore_idx fix, is there a test covering it (i.e., a test that would fail without the fix)?
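A hedged sketch of what such a test could check; `fused_cross_entropy` below is a hypothetical placeholder for the kernel's Python entry point, not the actual API:

```python
import torch
import torch.nn.functional as F

def test_ignore_idx_masks_reduced_loss():
    torch.manual_seed(0)
    N, V, IGNORE_IDX = 16, 128, -100
    logits = torch.randn(N, V)
    targets = torch.randint(0, V, (N,))
    targets[:4] = IGNORE_IDX  # some ignored tokens

    # Reference: PyTorch averages over valid tokens only when ignore_index is set.
    ref = F.cross_entropy(logits, targets, ignore_index=IGNORE_IDX, reduction="mean")

    # fused = fused_cross_entropy(logits, targets, ignore_idx=IGNORE_IDX, reduce_loss=True)
    # torch.testing.assert_close(fused, ref)

    # Without the fix, the kernel effectively divides by all N tokens, which differs
    # from the reference whenever any token is ignored, so the comparison above fails.
    buggy = F.cross_entropy(logits, targets, ignore_index=IGNORE_IDX, reduction="sum") / N
    assert not torch.isclose(ref, buggy)
```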

@wenchenvincent (Collaborator) left a review comment

@sarthak-amd Could you refactor the PR into 3 commits:

  • 2 commits cherry-picking the fixes from the upstream PRs.
  • 1 commit for the ignore_idx fix, with a test to cover it.

That way the PR would be very clear and easy to understand.

@wenchenvincent (Collaborator)

@sarthak-amd Could you address the comments? Also, please rebase onto the latest dev so that the hotfixes for the sgpu tests can pass.
